ABSTRACT

Large scale surveys usually employ a complex sampling 
design and as a consequence, no standard methods for estimation of 
the standard errors associated with the estimates of population means 
are available. Resampling methods, such as jackknife or bootstrap, 
are often used, with reference to their properties of robustness and 
reduction of bias. A method based on variance component models is
proposed as an alternative to the jackknife procedure used for 
calculation of the standard errors for the subpopulation means of 
proficiency scores in a large scale survey of education in the United 
States. A simulation study provides evidence that the jackknife 
estimator for the standard error of the estimate of the mean is
substantially less efficient than its variance component counterpart. 
The ultimate decision to use variance component methods should be 
based on the predicted (guessed) impact of the features of the data 
not accounted for by the variance component models. An appendix 
contains the scoring algorithm. Six tables present analysis results. 
(Contains seven references.) (Author/SLD) 
Abstract

Large scale surveys usually employ a complex sampling design, and as a
consequence no standard methods for estimation of the standard errors associated with
the estimates of population means are available. Resampling methods, such as jackknife
or bootstrap, are often used, with reference to their properties of robustness and reduction
of bias. We examine a method based on variance component models as an alternative
to the jackknife procedure used for calculation of the standard errors for the
subpopulation means of proficiency scores in a large scale survey of education in the
United States.

Keywords: Efficiency; Jackknife; Sampling design; Standard errors; Variance 
components. 
1. Background and motivation

The National Assessment of Educational Progress (NAEP) is a large scale survey 
of U.S. primary and secondary schools. It employs a stratified three-stage clustered 
sampling design for students in various age/grade groups, and a complex partially 
balanced incomplete block design for the administered items. The item administration 
design enables collecting information about a large number of items without 
administering each item to every individual in the sample. The questionnaire items are 
divided into content areas (academic subjects) and, within subjects, into attitude and 
cognitive items. A common block of background items is administered to all the 
individuals. 

For each content area an underlying proficiency (ability) scale is defined, and the 
scores on this scale are estimated from the responses to the cognitive items for all the 
students in the sample who have been administered at least one block of items from the 
content area. The proficiency scale is defined in such a way as to have, theoretically, the 
normal distribution with mean 250 and standard deviation 50. Each item has a limited 
number of response options, and for each cognitive item one response is correct. Results 
of the survey are published in the form of 'Summary Tables' which contain the sample
(weighted) means of these proficiency scores, and the estimated standard errors for these 
means, for each combination of attitude item and response to it, cross-classified by the 
demographic background variables. 

For example, the 1983-84 survey of 13-year-olds used a sample of approximately
31,000 students, each of whom was administered at least one of the 13 blocks of items
pertaining to reading skills. One of these 'reading' blocks (block N) was
administered to 3,078 students. To the attitude item No. 4 of this block 2,139 students
(approximately 70%) chose the response option A. One entry of the 'Summary Tables'
contains an estimate of the mean proficiency of these students, and an estimate of the
associated sampling variance. Most attitude items have five response options, and so the
estimate of a typical entry in the Summary Tables is based on a small proportion of the
total sample.

The sampling design involves 32 strata, within each of which a pair of primary 
sampling units (PSU's) is selected, with replacement. Schools are sampled within each 
selected PSU, and students are sampled within each selected school. The sampling 
procedures at each stage (PSU, school, student) are conditionally independent, given 
selection of the units at the higher level of aggregation. The (conditional) sampling 
probabilities are unequal, so as to oversample certain minority groups. The a priori 
(base) sampling weights were adjusted after the sampling procedure for non-response, 
and extremely large weights were trimmed so as to reduce the influence of the associated 
observations. Finally, the weights were adjusted by a process called poststratification to
conform to certain population totals. We refer to these adjusted weights as poststratified 
weights. 

Let $Y_{hIJK}$ be the score of student $K$ in school $J$ within primary sampling unit
(PSU) $I$ of the stratum $h$. The population mean is defined as

$$\bar{Y} = \sum_{hIJK} Y_{hIJK} \Big/ \sum_{hIJ} N_{hIJ}, \qquad (1)$$

where $N_{hIJ}$ is the number of students (of a particular age, or in a given grade) in the
school $(h, I, J)$. The mean for a subpopulation is defined similarly to (1), with $N_{hIJ}$
replaced by the counts of students belonging to the subpopulation, within schools.

In NAEP the traditional ratio estimator for the (sub-)population means is used:

$$\bar{y} = \sum_{hijk} y_{hijk} w_{hijk} \Big/ \sum_{hijk} w_{hijk}, \qquad (2)$$

where $w_{hijk}$ are the poststratified weights, and the summations are over all the students
in the sample and (if applicable) in the subpopulation. The sampling variance associated
with this estimator is estimated by a jackknife method: For each stratum $h = 1, \ldots, H$
the $h$-pseudosample is created from the original sample by replacing the data for the first
PSU in the stratum with the data from the other PSU in the stratum. The jackknife
estimator of the sampling variance for the ratio estimator (2) is defined as the corrected
sum of squares of the pseudosample means:

$$\sum_{h=1}^{H} (\bar{y}_h - \bar{y})^2, \qquad (3)$$

where $\bar{y}_h$ is the weighted mean for the $h$-pseudosample, with its weights adjusted for
non-response in this pseudosample. The ratio estimator (2) itself is not jackknifed since it is
believed to have satisfactory properties.
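
The pseudosample construction and the variance estimator (3) can be sketched in Python as follows. This is a minimal sketch rather than the operational NAEP procedure: it assumes that every stratum contains exactly two PSU's, and it leaves the weights unchanged in each pseudosample (the re-adjustment for non-response is omitted). The record layout `(stratum, psu, y, w)` and all function names are hypothetical.

```python
def weighted_mean(records):
    """Ratio estimator (2): sum of w*y divided by sum of w.

    records: list of (stratum, psu, y, w) tuples (hypothetical layout).
    """
    num = sum(w * y for _, _, y, w in records)
    den = sum(w for _, _, _, w in records)
    return num / den


def pseudosample(records, h):
    """Replace the data of the first PSU of stratum h by the data of the other PSU."""
    psus = sorted({p for s, p, _, _ in records if s == h})
    first, other = psus[0], psus[1]  # assumption: exactly two PSU's per stratum
    kept = [r for r in records if not (r[0] == h and r[1] == first)]
    copied = [(s, first, y, w) for s, p, y, w in records if s == h and p == other]
    return kept + copied


def jackknife_variance(records):
    """Sum over strata of squared deviations of pseudosample means, as in (3)."""
    ybar = weighted_mean(records)
    strata = sorted({s for s, _, _, _ in records})
    return sum((weighted_mean(pseudosample(records, h)) - ybar) ** 2 for h in strata)
```

Each stratum contributes one squared deviation, so the estimator uses only the contrast between the two PSU's of a stratum and no within-PSU information.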

The jackknife procedures are computationally very extensive and cumbersome
because they require calculation of the sampling weights adjusted for each jackknife
pseudoanalysis associated with a stratum. In the case of NAEP the jackknife estimates
of the standard errors for the population means, and for certain subpopulation means in
particular, are known to have very poor sampling properties (Johnson, 1988).

Several researchers have proposed model-based estimation procedures for data
from surveys that involve hierarchies (Malec & Sedransk, 1985; Aitkin & Longford, 1986;
Battese, Harter & Fuller, 1988). The common feature of these variance component
methods, considered from either a Bayesian or a likelihood perspective, is the modelling
of the correlation structure of the observed data, or equivalently, the decomposition of
the variation due to the levels of hierarchy induced by the sampling design.

The selected clusters at each level of the nesting hierarchy are a random sample 
from the respective populations of clusters (PSU's, schools, students), and so it is natural 
to represent the individual proficiency scores by the variance component model 

$$y_{hijk} = a_h + b_{hi} + c_{hij} + e_{hijk}, \qquad (4)$$

where the random terms $b$, $c$, $e$ form mutually independent samples from the normal
distributions with means 0 and variances $\sigma_3^2$, $\sigma_2^2$ and $\sigma_1^2$, respectively. For the stratum
means $a_h$ we consider two complementary assumptions:

A. They are unknown constants (fixed between-stratum differences).

B. They form a random sample from $N(\mu, \sigma_4^2)$ (random between-stratum
differences).

The strata are set prior to sampling, and so the assumption A. is more appropriate. The
assumption B. is attractive in that the 32 parameters $a_h$ are replaced by just two, $\mu$ and
$\sigma_4^2$. The original definition of the strata contains elements of arbitrariness, and that gives
some credence to the assumption B.

The jackknife is a very general method, and it involves essentially no parametric
assumptions. On the other hand, variance component methods are likely to be superior 
when the associated assumptions are satisfied, but in general they are much less robust 
than the jackknife procedures. The purpose of our study is to explore how and to what 
extent the jackknife procedures used in NAEP could be replaced by computationally 
more efficient methods, based on variance component analysis, that do not involve 
resampling. 

The paper is organized as follows: In Section 2 we describe the datasets used for 
the study and compare the results of the jackknife and variance component analyses. In 
Section 3 the performance of the jackknife analysis is compared with the variance 
component analysis by means of two simulation studies. Artificial data were generated 
according to the variance component model (4), in order to evaluate the extent of the 
largest possible loss of efficiency in the jackknife procedures. Technical details are given 
in Section 4 and in the Appendix. 

2. Data, procedures, and summary of results 

From the 1983-84 reading assessment, the data for the 13-year-olds who had been
administered block N of the reading items were extracted. The jackknife procedure
used in the operation of NAEP was replicated, and the variance component models (4),
with the assumptions A. and B., were fitted to the data corresponding to the students with a
specific response to a selected attitude item. We discuss the results for a representative 
set of item-by-response combinations, given in Table 1, that were selected in such a 
manner as to cover the entire range of the proportions of students that occur in the 
Summary Tables (approximately 10% - 100%). 

TABLE 1 HERE 

For estimation of the parameters in the variance component model (4) a 
modification of the Fisher-scoring algorithm of Longford (1987), adapted for unequal 
sampling weights, was used. The jackknife and variance component estimates of the 
standard errors for the estimates of the means are given in Table 2. 

TABLE 2 HERE 

The standard errors for the means using the variance component model with the 
assumption of fixed stratum-differences (A.) are very close to the jackknife standard 
errors. The largest discrepancy occurs for the case of 'all students', where the variance 
component standard error is about 10% higher. Assuming random stratum-differences 
leads to substantial overestimates of the standard errors, almost 30% in the case of 'all 
students'. The two variance component models result in identical estimates of the 
standard errors for the cases 6B (response B to the item 6) and 9C for which the estimate
of the stratum-level variance is equal to 0. 

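For variance component analysis we use the parametrization in terms of $\sigma_1^2$ and
$\tau_p$, $p = 2, 3, 4$, where $\tau_p = \sqrt{\sigma_p^2/\sigma_1^2}$ is the square root of the ratio of the
level-$p$ and the student-level variances. Thus, the variance of an observation is equal to
$\sigma_1^2(1 + \tau_2^2 + \tau_3^2 + \tau_4^2)$. The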
estimates of the variance components are given in Table 3. These results indicate that 
the between-school variation (within-PSU, or level 2) accounts for between 10% (all 
students) and 20% (9C) of the total variation. The between-PSU (level 3) variance is 
substantially smaller. 

TABLE 3 HERE 

The estimates of the variances for the model assumptions A. and B. are identical 
except that the estimates for the between-stratum variation in model B., are replaced by 
the 31 stratum-contrasts for model A. 

The main implication of these results is that the standard errors of the 
(sub-)population means obtained from variance component model fits can be used 
instead of the computationally more intensive jackknife procedures. The main advantage
of the variance component approach is that no resampling weights need to be calculated,
and so the process of reporting of the results could be considerably streamlined. 

3. Simulations 

In this Section we discuss the issue of efficiency of the jackknife estimation method. 
For the model assumptions (4) the direct maximum likelihood method is asymptotically 
fully efficient, and so it is reasonable to assume that the relative efficiency of the 
jackknife and variance component methods for model (4) provides the most unfavorable 
comparison for the jackknife. 

3.1 Jackknife vs. variance components 

Data sets were generated according to the model (4) with the assumption of fixed
stratum-differences. Since all the estimators of the variance components are translation 
invariant, our results are unaffected by the actual choice of the stratum-differences, and 
therefore they can be set identically to zero. In order to simplify the study further, we 
generated the following stratum/PSU/school design: For a given data set design (such 
as 4A, see Table 1) we generate a 'simulation' design by rounding the within-school totals 
of weights — these integers then represent the numbers of students within the schools in 
the simulation design. The design at the higher levels, i.e., the clustering of schools 
within PSU's and the pairs of PSU's within strata, is left intact. Equal 'simulation' 
weights are assigned to each observation. 
We report only the results for the imputed values of the variance components 1.0
(students), 0.12 (schools), and 0.03 (PSU's); they are representative of the results for
other realistic values of the variance components. The model (4) and all the estimators
used are invariant with respect to linear transformations, and therefore the
student-variance can be set to an arbitrary positive value and the population mean to any
real value. The other two variances are close to the values of the estimates in the real
data. Two hundred replicates of the simulation datasets based on the data for all the 
students and for the combination 4A were generated. In order to informally confirm the 
generalizability of the results the variance components for schools in the range 0.04 - 0.20 
and for PSU's in the range 0.01 - 0.10 were also used. 
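
Generating one replicate from the model (4) with zero stratum-differences can be sketched as follows. This is an illustrative sketch only; the nested dictionary used to describe the design and the function name `simulate_scores` are assumptions, and the default variances are the imputed values 1.0, 0.12 and 0.03 quoted above.

```python
import random


def simulate_scores(design, var_student=1.0, var_school=0.12, var_psu=0.03, seed=0):
    """Draw scores y = b + c + e for a fixed nesting design.

    design: {stratum: {psu: [number of students in each school]}} (hypothetical layout).
    Returns a list of (stratum, psu, school_index, y) tuples.
    """
    rng = random.Random(seed)
    scores = []
    for h, psus in design.items():
        for i, schools in psus.items():
            b = rng.gauss(0.0, var_psu ** 0.5)            # PSU-level deviation
            for j, n_students in enumerate(schools):
                c = rng.gauss(0.0, var_school ** 0.5)     # school-level deviation
                for _ in range(n_students):
                    e = rng.gauss(0.0, var_student ** 0.5)  # student-level deviation
                    scores.append((h, i, j, b + c + e))
    return scores
```

Because the stratum-differences are set to zero and the weights are equal, a replicate is fully determined by the design, the three variances and the seed.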

The results of the simulations indicate that the jackknife estimator of the mean does 
not provide any improvement over the arithmetic average, but the variance component 
estimator is appreciably more efficient. The variance component estimator for the 
standard error of the mean is far superior to its jackknife counterpart. The relevant 
results of the simulation study are summarized in Table 4. 

TABLE 4 HERE 

The Table contains three pairs of rows, corresponding to the arithmetic mean (i.e., 
assuming simple random sampling), the variance component method and the jackknife 
method. Within each pair the first row corresponds to the estimator of the mean and the 
second to the associated estimator of the standard error. The row 'VC.GM.' represents 
the standard error of the arithmetic mean under the assumption of the variance
component model. The bias of each estimator of the standard error for the mean can
be assessed by comparing the mean of the model-based estimates for the standard error
with the sampling standard deviation of the corresponding estimates for the mean. For 
example, for the 'all students' dataset the sampling standard deviation of the ordinary 
mean (0.0371) is 1.87 times higher than the average square root of the mean square 
errors (.0198). The corresponding ratio for the dataset 4A is 1.67. 

The jackknife estimates of the mean are closer to the ordinary means than the 
variance component estimates. The jackknife estimator of the standard error for the 
mean has very little bias (compare mean of JK.SE. with the sampling standard deviation 
of JK.M.), and it also estimates the sampling standard deviation of the ordinary mean 
(G.M.) without any observable bias. The variance component estimator for the sampling 
standard deviation of the ordinary mean (VC.GM.) overestimates the sampling standard 
deviation of the ordinary mean by about 4%. 

The variance component estimator for the mean (V.C.M.) is appreciably more 
efficient than the jackknife estimator. Its sampling standard deviation is lower than the 
sampling standard deviation for the jackknife or the ordinary mean by 9% (all students) 
and 7% (4A). Note, however, that the estimate of the sampling standard deviation for
the variance component mean is biased (compare the mean of VC.SE. with the sampling
standard deviation for the V.C.M.); it has a positive bias of about 6% for both data sets.

The sampling standard deviations for the variance component estimators are 
substantially smaller than their jackknife counterparts. The sampling standard deviations 
for the VC.SE. and VC.GM. are about 33% (all students) and 20% (4A) smaller than the
corresponding values for JK.SE. 

We conclude that, contingent on appropriateness of the variance component model 
(4), estimation of the mean could be moderately improved (i.e., lower mean squared 
error) by application of variance component methods, but substantial improvement in the 
sampling properties of the estimates of the associated standard errors would result. The 
additional benefit would be that the resampling weights could be dispensed with. 

3.2 Lumpy data

The Summary Tables contain a large number of entries related to subpopulations, 
such as minorities, which constitute only a small proportion of the target population, and 
they may be very unevenly distributed across the strata. The standard errors obtained 
by the jackknife procedures are subject to substantial sampling variation, and their 
estimation is probably very inefficient (Johnson, 1988). On the other hand the asymptotic 
properties of the maximum likelihood estimators using the variance component models 
may not hold for small datasets. 

In order to compare the jackknife and variance component methods for such 'lumpy'
data we have generated several data sets from the artificial dataset of 'all students' by the
following two-stage sampling design:

1. For each stratum generate a value $p_i$ from $U(0.2, 0.5)$ (uniform distribution
on the interval 0.2 - 0.5). Then, for the schools in the stratum, include the
whole school in the dataset with (stratum-specific) probability $p_i$.

2. For the included schools: For each school generate a value $p_j$ from $U(0.1, 0.5)$,
and include a student from the school in the dataset with (school-specific)
probability $p_j$.
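
The two steps above can be sketched in Python as follows. The sketch assumes that each record starts with a stratum label and a school identifier that is unique within the stratum; this layout and the function name `lumpy_subsample` are hypothetical.

```python
import random


def lumpy_subsample(records, seed=0):
    """Two-stage thinning: schools within strata, then students within schools.

    records: list of tuples whose first two fields are (stratum, school_id).
    """
    rng = random.Random(seed)
    p_stratum = {}  # stratum -> school inclusion probability, drawn from U(0.2, 0.5)
    p_school = {}   # (stratum, school) -> student probability, or None if school excluded
    kept = []
    for rec in records:
        h, school = rec[0], rec[1]
        if h not in p_stratum:
            p_stratum[h] = rng.uniform(0.2, 0.5)
        key = (h, school)
        if key not in p_school:
            included = rng.random() < p_stratum[h]
            p_school[key] = rng.uniform(0.1, 0.5) if included else None
        if p_school[key] is not None and rng.random() < p_school[key]:
            kept.append(rec)
    return kept
```

Since whole schools are dropped with probability between 0.5 and 0.8, some PSU's and occasionally entire strata can end up unrepresented, which is the 'lumpiness' the design is meant to produce.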

We discuss the results for three such datasets, containing 368 students in 98 schools,
329 students in 109 schools and 284 students in 85 schools, respectively. In these datasets
between 8 and 12 PSU's and between 1 and 3 strata were not represented at all. For illustration, the
nesting design for one of these datasets is given in Table 5; most PSU's are represented
by fewer than 10 students, although 5 schools have 10 or more students in the dataset.

TABLE 5 HERE 

The proficiency scores were generated by the variance component model (4) with the
variances $\sigma_1^2 = 1$, $\sigma_2^2 = 0.12$, $\sigma_3^2 = 0.03$ and $\sigma_4^2 = 0$, and mean $\mu = 0$. Results of the
simulation study using 200 replicates are given in Table 6. Table 6 has the same format
as Table 4, but in addition it contains summary statistics for the jackknife and variance
component estimators of sampling variance (see (5) below). We see that the mean
squared error (M.S.E.) is biased and the jackknife estimate of the standard error agrees
with the sampling standard deviation of both the jackknife and the ordinary mean
estimators. The variance component estimator of the mean is only marginally more
efficient.

The variance component estimator of the standard error for the mean is more biased
than the jackknife estimate, but its sampling standard deviation is about half that
of the jackknife.

TABLE 6 HERE 

The number of degrees of freedom associated with the estimators of the standard 
error can be estimated by the formula 

$$\frac{2\,(\text{estimate of the variance})^2}{\text{sampling variance of the squared standard error}} \qquad (5)$$

derived by matching the moments of the distribution. These values are given in Table
6. The variance component estimator of the standard error has 35-40 more degrees of 
freedom than its jackknife counterpart. Note that in the jackknife 31 degrees of freedom 
is the upper limit for any data set. 
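
The moment-matching argument behind (5) is that if a variance estimator behaves like a multiple of a chi-squared variable with $\nu$ degrees of freedom, its variance is twice its squared mean divided by $\nu$. A small sketch of the calculation from simulation replicates follows; using the mean of the replicated variance estimates as the 'estimate of the variance' is an assumption of this sketch, and the function name is hypothetical.

```python
from statistics import mean, variance


def effective_df(variance_estimates):
    """Formula (5): 2 * (mean of variance estimates)^2 / their sampling variance.

    variance_estimates: squared standard errors, one per simulation replicate.
    """
    centre = mean(variance_estimates)
    spread = variance(variance_estimates)  # sample variance across replicates
    return 2.0 * centre ** 2 / spread
```

A more variable estimator of the squared standard error yields a smaller value, which is how the formula expresses a loss of efficiency.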

We conclude that in small and lumpy datasets the variance component estimator of 
the standard error for the mean is likely to be much more efficient than the jackknife 
estimator, probably even in the presence of features that to a moderate extent violate the
assumptions of the variance component model. 
4. Computational details

In the jackknife analysis we define a pseudosample corresponding to each stratum, 
and carry out the 'basic' analysis for each sample. In our case the basic analysis consists 
of calculation of the weighted mean (2), and the pseudosample for stratum h is generated 
from the original data set by deletion of the first PSU of the stratum and replacing it with 
the second PSU of the stratum, with unaltered sampling weights. The sampling weights 
are then adjusted for poststratification. Let the weighted mean of the pseudosample h 
be $\bar{y}_h$, and the weighted mean for the original data set $\bar{y}$. The jackknife estimator for
the mean (1) is given by

$$\tilde{y} = \frac{1}{H} \sum_{h} \{H\bar{y}_h - (H - 1)\bar{y}\}$$

($H$ is the number of strata, 32), and the sampling variance of this estimator is estimated
by (3). For further details we refer the reader to Beaton et al. (1988), Ch. 14.2. 

The Fisher scoring algorithm for variance component analysis requires formulae for 
the Jacobian and the expectation of the Hessian associated with the estimated 
parameters. For easier description we consider first the case of equal weights. The 
log-likelihood for a set of observations with equal sampling weights is given, apart from 
an additive constant, by the formula 

$$-2\log\lambda = \log\det(V) + e^{\top}V^{-1}e,$$

where $V$ is the variance matrix for the observations and $e = y - \mu$ is the vector of
residuals. The determinant and the inverse of the variance matrix can be evaluated 
efficiently, and without numerical inversion of any matrices by the recursive algorithm 
described in Longford (1987) where the formulae for the Jacobian and Hessian are 
derived. Details are given in the Appendix. The computational procedure is based on the
counts of students within schools, the within-school totals of proficiencies and the sample 
total of squares of proficiencies. The weighted version of the algorithm uses the same 
formulae, with the counts of students replaced by totals of weights, the totals of 
proficiencies by the corresponding weighted totals and the sum of squares of proficiencies
by the weighted sum of squares. The sampling weights are normalized (multiplied by a 
constant) so that the sample total of the normalized weights is equal to the number of 
students in the sample. 
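
The normalization of the weights and the weighted summaries described above can be sketched as follows. The record layout `(school, y, w)` and the function name are hypothetical; the sketch only prepares the inputs of the algorithm and does not fit the model.

```python
from collections import defaultdict


def weighted_school_totals(records):
    """Normalize weights to sum to n, then accumulate the weighted summaries.

    records: list of (school, y, w) tuples (hypothetical layout).
    Returns ({school: [total weight, weighted total of y]}, weighted sum of squares).
    """
    n = len(records)
    scale = n / sum(w for _, _, w in records)  # normalized weights sum to n
    totals = defaultdict(lambda: [0.0, 0.0])
    sum_squares = 0.0
    for school, y, w in records:
        wn = w * scale
        totals[school][0] += wn        # replaces the count of students
        totals[school][1] += wn * y    # replaces the total of proficiencies
        sum_squares += wn * y * y      # replaces the sum of squares
    return dict(totals), sum_squares
```

With equal weights the normalized weights are all 1, and the summaries reduce to the counts, totals and sum of squares used in the unweighted algorithm.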

The adopted parametrization has the advantage that the estimate of the elementary
variance $\sigma_1^2$ is obtained at each iteration by setting the Jacobian to zero:

$$2\,\partial(\log\lambda)/\partial\sigma_1^2 = -n/\sigma_1^2 + e^{\top}W^{-1}e/\sigma_1^4 = 0,$$

where $W = \sigma_1^{-2}V$. Note that in the $\tau$-parametrization $W$ does not depend on $\sigma_1^2$.
Instead of the variance ratios $\tau_p^2$, their respective square roots, $\tau_2$, $\tau_3$ and $\tau_4$, are
estimated; the Jacobian and Hessian are adjusted by the chain rule. The main
advantage of estimating (ratios of) standard deviations instead of (ratios of) variances is 
that negative estimates of the variances are avoided. Also, the standard errors obtained 
from the inverse of the estimated expected information matrix are easier to interpret 
because negative standard deviations in a confidence interval correspond to positive 
variances. 
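
The chain-rule adjustment mentioned above can be written out as follows, with $\lambda$ denoting the likelihood and $\tau_p$ the square root of the ratio of the level-$p$ and student-level variances. The second identity holds for the expected second derivatives because the additional term, proportional to the first derivative, has zero expectation.

```latex
\frac{\partial \log\lambda}{\partial \tau_p}
  = 2\tau_p \, \frac{\partial \log\lambda}{\partial \tau_p^2},
\qquad
E\left\{ \frac{\partial^2 \log\lambda}{\partial \tau_p \, \partial \tau_q} \right\}
  = 4\tau_p\tau_q \, E\left\{ \frac{\partial^2 \log\lambda}{\partial \tau_p^2 \, \partial \tau_q^2} \right\}.
```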

In the model (4) the constant $\mu$ can be replaced by a linear predictor, such as one
allowing different within-stratum means. In general, addition of an explanatory variable 
will reduce the variance components (or leave them unchanged). In the model with 
random strata a variable defined for strata will leave the PSU-, school- and student-level 
variances unchanged, and can reduce only stratum-level variance. The stratum factor 
(categorical variable with 32 categories) will saturate the stratum-level variance, and will 
result in a zero stratum-level variance component. Thus the overall mean in the model 
(4) with fixed stratum-differences can be estimated by applying the model (4) with 
random stratum-differences, and then setting the stratum variance to zero. Direct 
estimation of the 32 stratum-means by the Fisher scoring method would involve iterative 
inversion of 32 x 32 matrices, a substantial burden compared to the estimation of the 
variance components. As an alternative the ordinary within-stratum means could be 
imputed for them in variance component estimation. 

5. Summary

The reported study has demonstrated that the computationally extensive procedures 
based on the jackknife method can be replaced by variance component methods which 
do not involve resampling. The differences between the jackknife and variance 
component-based estimates of the (sub-) population means are of no practical importance 
since they are comparable to rounding errors. For small data sets (with 1000 or fewer 
subjects) the jackknife and the variance component-based estimates of the standard errors 
for the corresponding estimates of population means are almost identical, but for 
subpopulation means the jackknife standard errors are substantially less efficient than 
their variance component counterparts. For larger data sets some differences arise, but 
they cause no noticeable changes in the Summary Tables. The efficiency of the jackknife 
and variance component methods can be compared only on simulated data sets, such as 
those generated by a variance component model. The simulation study described in 
Section 3 provides evidence that the jackknife estimator for the standard error of the 
estimate of the mean is substantially less efficient than its variance component 
counterpart. A small proportion of this loss can be attributed to the difference in efficiency
of the jackknife and variance component estimators of the mean. The ultimate decision 
to use variance component methods should be based on the predicted (guessed) impact 
of the features of the data not accounted for by the variance component models, such as 
the nature of the poststratified sampling weights, and possible variation of the variance 
components across the strata. The impact of these features in the analyses of the studied 
data sets appears to be ignorable, but only a study extended to the entire variety of 
dataset designs involved in NAEP could arbitrate whether these features can be ignored 
throughout, and the jackknife resampling weights made obsolete. The gain in efficiency 
by using variance component analysis is most striking in small 'lumpy' datasets because
the jackknife estimator of the standard error ignores the within-PSU information. 

The additional information provided by the variance component analysis consists of 
the estimates of the variance components. The school-level variance is of primary 
interest; it provides a description of school heterogeneity. Future alterations in the 
design of the entire survey could be easier to plan, with optimality of inference as a goal, 
and some results, such as the standard errors for population means, could be predicted
prior to data collection using past (or imputed) values of the variance components. 
Fisher scoring algorithm for variance component estimation

Suppose the observations for the subjects (students) are in lexicographic order, so
that their variance matrix $V$ is block-diagonal and equal to

$$V = \sigma_1^2 W = \sigma_1^2 \left( I + \tau_2^2 J^{(2)} + \tau_3^2 J^{(3)} + \tau_4^2 J^{(4)} \right), \qquad (A.1)$$

where all the matrices have sizes $n \times n$, $I$ is the unit matrix and $J^{(p)}$, $p = 2, 3, 4$, are the
respective incidence matrices for schools, PSU's and strata; the element $(r_1, r_2)$ of $J^{(p)}$ is
equal to 1 if the students $r_1$ and $r_2$ belong to the same unit at the level $p$, and is equal to
0 otherwise. Fixed stratum-differences correspond to $\tau_4 = 0$. Let $\mathbf{1}^{(2)}_{hij}$, $\mathbf{1}^{(3)}_{hi}$, $\mathbf{1}^{(4)}_{h}$ and $\mathbf{1}$
denote the respective $n \times 1$ indicator vectors for a school, a PSU, a stratum, and for the
whole sample, so that

$$\mathbf{1}^{(3)}_{hi} = \sum_j \mathbf{1}^{(2)}_{hij}, \quad \mathbf{1}^{(4)}_{h} = \sum_i \mathbf{1}^{(3)}_{hi}, \quad \text{and} \quad \mathbf{1} = \sum_h \mathbf{1}^{(4)}_{h}. \qquad (A.2)$$

In order to simplify and streamline the notation we will use the symbol $\sum_{(p)}$ for the
summation over all units at the level $p$, and the dot notation; for example,

$$J^{(p)} = \sum_{(p)} \mathbf{1}^{(p)}_{\cdot} \mathbf{1}^{(p)\top}_{\cdot}.$$

The log-likelihood for the model (4) is given by the formula

$$-2\log\lambda = n\log\sigma_1^2 + \log\det(W) + e^{\top}W^{-1}e/\sigma_1^2, \qquad (A.3)$$

where $e = y - \mu$ is the vector of residuals, i.e., the differences of the observed and model
values. The vector $\mu$ may vary across the observations, e.g., related to a set of
explanatory variables by a linear formula, $\mu = X\beta$, where $X$ is a design matrix of known
constants, and $\beta$ a vector of (known or unknown) location parameters. In most cases we
consider the case of constant predictor $\mu$ ($= \mu\mathbf{1}$) or stratum-specific means $\mu_h$.

The estimate of the elementary-level variance is obtained by setting the derivative
of (A.3) with respect to $\sigma_1^2$ to zero:

$$n/\sigma_1^2 - e^{\top}W^{-1}e/\sigma_1^4 = 0,$$

which has, for a given $W$, the unique solution $\hat{\sigma}_1^2 = e^{\top}W^{-1}e/n$.

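The first partial derivatives with respect to $\tau_p^2$, $p = 2, 3, 4$, are equal to

$$\partial(\log\lambda)/\partial\tau_p^2 = -\tfrac{1}{2} \sum_{(p)} \left\{ \mathbf{1}^{(p)\top}_{\cdot} W^{-1} \mathbf{1}^{(p)}_{\cdot} - \left( e^{\top} W^{-1} \mathbf{1}^{(p)}_{\cdot} \right)^2 / \sigma_1^2 \right\}, \qquad (A.4a)$$

and the expectations of the second derivatives ($q > p$) are

$$E\{\partial^2\log\lambda/\partial\tau_p^2\,\partial\tau_q^2\} = -\tfrac{1}{2}\,\mathrm{tr}\{J^{(p)}W^{-1}J^{(q)}W^{-1}\} = -\tfrac{1}{2} \sum_{(q)} \sum_{(p|q)} \left( \mathbf{1}^{(q)\top}_{f} W^{-1} \mathbf{1}^{(p)}_{g} \right)^2, \qquad (A.4b)$$

where the double summation is over all units $f$ at the level $q$ and all its subunits $g$ at the
level $p$.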

The maximum likelihood estimator for the regression parameters is given by the
generalized least squares formula

$$\hat{\beta} = (X^{\top}W^{-1}X)^{-1} X^{\top}W^{-1}y,$$

where $X$ is the (regression) design matrix, $E(y) = X\beta$. For the case of no explanatory
variables we have the estimate of the grand mean

$$\hat{\mu} = (\mathbf{1}^{\top}W^{-1}\mathbf{1})^{-1}\,\mathbf{1}^{\top}W^{-1}y, \qquad (A.5)$$

with the information $\sigma_1^{-2}\,\mathbf{1}^{\top}W^{-1}\mathbf{1}$.

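Since $W$ is a matrix of large size it is important to have efficient algorithms for
computation of expressions involving $W^{-1}$. We define $W_1 = I$ and

$$W_p = W_{p-1} + \tau_p^2 J^{(p)},$$

$p = 2, 3, 4$, so that $W_4 = W$. For the inverses of these matrices we have the recursive
formula

$$W_p^{-1} = W_{p-1}^{-1} - W_{p-1}^{-1} \sum_{(p)} \left\{ \mathbf{1}^{(p)}_{\cdot} \mathbf{1}^{(p)\top}_{\cdot}\, \tau_p^2 \big/ \left( 1 + \tau_p^2\, \mathbf{1}^{(p)\top}_{\cdot} W_{p-1}^{-1} \mathbf{1}^{(p)}_{\cdot} \right) \right\} W_{p-1}^{-1}. \qquad (A.6)$$

We define

$$C^{(p)}_{\cdot} = \mathbf{1}^{(p)\top}_{\cdot} W_{p-1}^{-1} \mathbf{1}^{(p)}_{\cdot}$$

and

$$E^{(p)}_{\cdot} = e^{\top} W_{p-1}^{-1} \mathbf{1}^{(p)}_{\cdot}, \qquad (A.7)$$

where the dot $\cdot$ stands for a unit at the level $p$, and

$$C = \mathbf{1}^{\top}W^{-1}\mathbf{1}, \qquad E = e^{\top}W^{-1}\mathbf{1}.$$

We have $C^{(2)}_{\cdot} = n_{\cdot}$ (number of students from school $\cdot$ in the sample), and
$E^{(2)}_{hij} = \sum_k e_{hijk}$ (sum of the residuals within school $hij$). The inversion formula (A.6)
implies the identities

$$D^{(p+1)}_{\cdot} = \sum D^{(p)}_{\cdot} \big/ \left( 1 + \tau_p^2 C^{(p)}_{\cdot} \right), \quad p = 2, 3,$$

and

$$D = \sum_{(4)} D^{(4)}_{\cdot} \big/ \left( 1 + \tau_4^2 C^{(4)}_{\cdot} \right), \qquad (A.8)$$

where $D$ stands for either $C$ or $E$, and the summation in the first identity is over the
level-$p$ units within a unit at the level $p + 1$.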
It is easy to show that 

$$ \log\det(W) = \sum_{p=2}^{4} \sum_{(p)} \log\left(1 + \tau_p^2 C^{(p)}\right) \qquad \text{(A.9)} $$

and 

$$ e^{\top} W^{-1} e = e^{\top} e - \sum_{p=2}^{4} \sum_{(p)} \left\{E^{(p)}\right\}^2 \tau_p^2 / \left(1 + \tau_p^2 C^{(p)}\right). \qquad \text{(A.10)} $$

These formulae enable efficient calculation of the log-likelihood (A.3). All the quadratic 
forms required for (A.4a), (A.4b) and (A.5) can be calculated directly from the constants 
$C^{(p)}$ and $E^{(p)}$ using (A.6). The partial derivatives with respect to the square roots of the 
variances are calculated from (A.4a) and (A.4b) using the chain rule. 
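
The scalar recursions can be sketched in a few lines. The example below is an illustration under simplifying assumptions, not the implementation used in the study: it uses only two levels above the students (schools and PSUs), hypothetical residuals and variance ratios, and a function name chosen for this example. It builds the within-school scalars, aggregates them upwards, and accumulates the log-determinant and the quadratic form $e^{\top} W^{-1} e$ without forming $W$.

```python
import numpy as np

def scalar_summaries(e, school, psu, tau2_school, tau2_psu):
    """Totals C, E, log det(W) and e'W^{-1}e from unit-level scalars.

    Assumes school labels are unique across PSUs (schools nested in PSUs).
    """
    schools = np.unique(school)
    C2 = np.array([(school == s).sum() for s in schools], dtype=float)  # n per school
    E2 = np.array([e[school == s].sum() for s in schools])              # residual sums
    psu_of_school = np.array([psu[school == s][0] for s in schools])
    f2 = 1.0 + tau2_school * C2

    psus = np.unique(psu)
    C3 = np.array([(C2 / f2)[psu_of_school == p].sum() for p in psus])  # PSU-level C
    E3 = np.array([(E2 / f2)[psu_of_school == p].sum() for p in psus])  # PSU-level E
    f3 = 1.0 + tau2_psu * C3

    C = (C3 / f3).sum()                       # 1'W^{-1}1
    E = (E3 / f3).sum()                       # e'W^{-1}1
    logdet = np.log(f2).sum() + np.log(f3).sum()
    quad = (e @ e
            - (tau2_school * E2**2 / f2).sum()
            - (tau2_psu * E3**2 / f3).sum())  # e'W^{-1}e
    return C, E, logdet, quad

# Hypothetical residuals and design: 6 students, 3 schools, 2 PSUs.
e = np.array([0.5, -1.0, 2.0, 0.0, -0.5, 1.5])
school = np.array([0, 0, 1, 1, 1, 2])
psu = np.array([0, 0, 0, 0, 0, 1])
tau2_school, tau2_psu = 0.3, 0.2

C, E, logdet, quad = scalar_summaries(e, school, psu, tau2_school, tau2_psu)
print(C, E, logdet, quad)
```

The cost is linear in the number of students, whereas a dense inverse of $W$ grows with the cube of the sample size; extending the sketch to a further level repeats the same aggregation step once more.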

The Fisher scoring algorithm is an iterative procedure, and as such it requires initial 
values for all the estimated parameters. For the regression parameters (the population 
mean) the ordinary least squares (arithmetic mean) provides a suitable initial solution, 
and for the variance components any non-iterative procedure which provides positive 
values is suitable. We have used a naive moment estimate which, in the models fitted, 
turned out to have values between 50% and 200% of the maximum likelihood estimate. The 
Fisher scoring algorithm required between 6 and 12 iterations. The iterations were 
terminated when both the change of the -2 log-likelihood and of each were smaller 
than IQ-^. 

The estimator of the sampling variance of the arithmetic mean under the variance 
component model (VC.GM. in Table 4) is equal to $\mathbf{1}^{\top} V \mathbf{1}/n^2$; its evaluation is 
straightforward using (A.1). 
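
A minimal sketch of this evaluation follows. It assumes that the variance matrix has the form $V = \sigma^2 (I + \sum_p \tau_p^2 J^{(p)})$, with $J^{(p)}$ indicating pairs of students who share a unit at level $p$; this form is inferred from the definition of $W_p$ above rather than quoted from (A.1). Under that assumption $\mathbf{1}^{\top} J^{(p)} \mathbf{1}$ is the sum of squared unit sizes, so no matrix needs to be formed. The function name and the example values are hypothetical.

```python
import numpy as np

def variance_of_arithmetic_mean(sigma2, tau2_by_level, ids_by_level):
    """1'V1 / n^2 for V = sigma2 * (I + sum_p tau2_p * J_p), an assumed form."""
    n = len(ids_by_level[0])
    total = float(n)                              # 1'I1 = n
    for tau2, ids in zip(tau2_by_level, ids_by_level):
        _, counts = np.unique(ids, return_counts=True)
        total += tau2 * np.sum(counts.astype(float) ** 2)   # 1'J_p 1
    return sigma2 * total / n**2

# Hypothetical design: 6 students, 3 schools, 2 PSUs.
school = np.array([0, 0, 1, 1, 1, 2])
psu = np.array([0, 0, 0, 0, 0, 1])
var_mean = variance_of_arithmetic_mean(2.0, [0.3, 0.2], [school, psu])
print(var_mean, np.sqrt(var_mean))
```

The square root of the result is the quantity on the standard-error scale; with all variance ratios set to zero the expression reduces to the familiar $\sigma^2/n$.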

Adaptation for sampling weights. 

Formally, the variance matrix V in (A.1) can be replaced by 

V = (T^H^nVH^ 

where $H$ is a diagonal matrix of sampling weights. All the formulae (A.4) - (A.8) carry 
over directly after redefining the within-school scalars in (A.7) as 

$$ C^{(2)}_{hij} = \sum_k H_{hijk} $$

and 

$$ E^{(2)}_{hij} = \sum_k e_{hijk} H_{hijk}. \qquad \text{(A.11)} $$

An iteration of the algorithm starts with the scalars (A.11), from which the level-3 and 
level-4 totals $C^{(3)}$, $E^{(3)}$, $C^{(4)}$ and $E^{(4)}$, as well as the sample totals $C$ and $E$, are calculated using 
(A.8). From these scalars the items for the Jacobian and Hessian of the Fisher scoring 
algorithm are calculated using (A.6). 
Table 1. The data sets used in the study, with numbers of students and schools. 

Item   Response     Students   Schools

6      B                 309       225
9      C                 560       299
5      E               1,038       325
8      A               1,312       341
4      A               2,240       379
All Students           3,076       392

Table 2. Comparison of the jackknife and variance component estimates. 

The means from variance component analysis are quoted only for model A.; 
for model B. they differ by less than .0005. 

                    JACKKNIFE              VARIANCE COMPONENT
Item-                                                 St. Error
Response         Mean    St. Error      Mean        A.        B.

6 B            250.76      1.694      250.98      1.685     1.685
9 C            257.19      1.768      257.22      1.730     1.730
5 E            264.82      1.265      264.39      1.272     1.664
8 A            252.42      1.252      252.08      1.241     1.388
4 A            254.81      1.263      255.12      1.299     1.630
All Students   253.38      1.046      253.51      1.154     1.350

Key: A. - fixed stratum differences 
     B. - random stratum differences 
Table 3. Estimates of the variance components. 

The first line in every cell contains the estimate of the variance, the second 
line the square root of this estimate, and the third line (in parentheses) the 
standard error for the square root. 

Case            (students)   (schools)    (PSU's)    (strata)

6B                 9.83        .1428       .0000      .0000
                   3.136       .3379       .0000      .0000
                              (.0736)     (.1921)    (.1919)

9C                10.28        .2203       .0083      .0000
                   3.206       .4693       .0909      .0000
                              (.0757)     (.2074)    (.1925)

5E                10.45        .1575       .0000      .0287
                   3.237       .3968       .0000      .1693
                              (.0512)                (illegible)

8A                10.39        .1414       .0076      .0099
                   3.223       .3761       .0870      .0995
                              (.0483)     (.1480)    (.0959)

4A                10.94        .1457       .0332      .0286
                   3.308       .3817       .1823      .1692
                              (.0483)     (.1480)    (.0959)

All students      11.55        .1223       .0234      .0136
                   3.398       .3497       .1530      .1168
                              (.0319)     (.0628)    (.0672)
Table 4. Results of the simulation study for the 'all students' and the 4A design. 

Sampling means and standard deviations for various estimators of the mean 
and of the standard error (200 replicates). 

                  All students               4A
Estimator      Mean       Sampling      Mean       Sampling
               Estimate   St. Dev.      Estimate   St. Dev.

G.M.           -.00448     .03706       -.00670     .0386
M.S.E.          .01979     .00029        .02321     .00038
V.C.M.         -.00597     .03392       -.00625     .03631
VC.SE.          .03599     .00364        .03848     .00428
VC.GM.          .03851     .00375        .04059     .00446
JK.M.          -.00436     .03725       -.00651     .03895
JK.SE.          .03762     .00543        .03818     .00544

Key: 

G.M. . . . ordinary (arithmetic) mean 
M.S.E. . . . square root of the mean squared deviation from G.M. (the simple 
random sampling estimator of the standard error) 
V.C.M. . . . variance component (ML) estimator of the mean 
VC.SE. . . . estimator of the asymptotic standard error of V.C.M. 
VC.GM. . . . the estimator of the sampling standard deviation of G.M. given the 
variance component model (4), A. 
JK.M. . . . the jackknife estimator of the mean 
JK.SE. . . . the jackknife estimator of the standard error for the mean 
Table 5. Simulated sampling design of a 'lumpy' dataset. 

Counts of students within schools (368 students from 98 schools). For 
example, the second PSU of the stratum 2 has 3 schools in the dataset, with 
6 students in one school and one each in the other two schools. 
(- indicates that the PSU has no schools in the dataset.) 

Stratum    First PSU      Second PSU

 1         5              1
 2         1              6 1 1
 3         6 4            9 2
 4         10 1 2 1       6
 5         1 1 1          2 2 3 1
 6         5              3 4
 7         5 4            17
 8         4 8 17         7 2 3
 9         2 2            -
10         7 3            1 2
11         12 9 2         3 5
12         1 2            2
13         5              4 6
14         7              1
15         4 9 4          1
16         5              -
17         2 4            4
18         4 1            9
19         3 2 2          4 2
20         2              1 1 1
21         1              2
22         -              1 6
23         3 9            3
24         10             -
25         1 7 1          1 8
26         -              1 1
27         3 3            3 2
28         -              2
29         1 5 1          -
30         -              1 5
Table 6. Results of the simulation study for 'lumpy' data. 

Sampling means and standard deviations for various estimators of the mean 
and of the standard errors (200 replicates). 

               368 Students           329 Students           284 Students
               98 Schools             109 Schools            85 Schools
Estimator      Mean       Sampling    Mean       Sampling    Mean       Sampling
               Estimate   St. Dev.    Estimate   St. Dev.    Estimate   St. Dev.

G.M.            .00273     .07478      .00966     .07975     -.00007     .07892
M.S.E.          .05601     .00215      .05913     .00237      .06358     .00268

V.C.M.         -.00078     .07110      .00838     .07467     -.00048     .07599
VC.SE.          .07301     .00649      .07480     .00754      .08159     .00768
VC.S2.          .00537     .00096      .00565     .00114      .00672     .00127

VC.GM.          .07740     .00885      .08058     .01077      .08636     .01034

JK.M.           .00268     .07498      .00993     .07923      .00022     .07930
JK.SE.          .07603     .01396      .07910     .01751      .08286     .01283
JK.S2.          .00598     .00223      .00656     .00315      .00703     .00220

JK.DF.          14.4                    8.7                   20.4
VC.DF.          62.2                   48.7                   56.3

Key: See Table 4, and: 

VC.S2. . . . estimator of the asymptotic variance of V.C.M. 
JK.S2. . . . the jackknife estimator of the variance of JK.M. 
JK.DF. . . . the estimated number of degrees of freedom of the jackknife estimator 
of the sampling variance 
VC.DF. . . . the estimated number of degrees of freedom of the variance 
component estimator of the sampling variance 
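
The key does not state how the degrees of freedom were estimated. One common approach, assumed here for illustration only, matches the first two moments of a variance estimator to those of a scaled chi-square variable, which gives $d = 2\,(\text{mean}/\text{s.d.})^2$. Applied to the mean estimates and sampling standard deviations of JK.S2. and VC.S2. in Table 6, this reproduces the tabulated values up to the rounding of the inputs. The function name below is hypothetical.

```python
def moment_matched_df(mean_var, sd_var):
    """d.f. of a scaled chi-square with the given mean and standard deviation."""
    return 2.0 * (mean_var / sd_var) ** 2

# (mean estimate, sampling st. dev.) of the variance estimators,
# first design in Table 6 (368 students, 98 schools).
jk_df = moment_matched_df(0.00598, 0.00223)   # jackknife, JK.S2.
vc_df = moment_matched_df(0.00537, 0.00096)   # variance component, VC.S2.
print(round(jk_df, 1), round(vc_df, 1))
```

For this design the sketch gives about 14.4 for the jackknife and about 62.6 for the variance component estimator, against the tabulated 14.4 and 62.2; the small gap for the latter is of the size expected from inputs rounded to five decimal places.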